Ineq-Comp: Benchmarking Human-Intuitive Compositional Reasoning in Automated Theorem Proving on Inequalities

Zhao, Haoyu, Geng, Yihan, Tang, Shange, Lin, Yong, Lyu, Bohan, Lin, Hongzhou, Jin, Chi, Arora, Sanjeev

arXiv.org Artificial Intelligence

LLM-based formal proof assistants (e.g., in Lean) hold great promise for automating mathematical discovery. But beyond syntactic correctness, do these systems truly understand mathematical structure as humans do? We investigate this question in the context of mathematical inequalities -- specifically, the prover's ability to recognize that a given problem simplifies by applying a known inequality such as AM/GM. We are particularly interested in their ability to do this in a compositional setting, where multiple inequalities must be applied as part of a solution. We introduce Ineq-Comp, a benchmark built from elementary inequalities through systematic transformations, including variable duplication, algebraic rewriting, and multi-step composition. Although these problems remain easy for humans, we find that most provers -- including Goedel, STP, and Kimina-7B -- struggle significantly. DeepSeek-Prover-V2-7B shows relative robustness, but still suffers a 20% performance drop (pass@32). Even for the DeepSeek-Prover-V2-671B model, the gap between compositional variants and seed problems persists, implying that scaling up model size alone does not fully solve the compositional weakness. Strikingly, performance remains poor for all models even when formal proofs of the constituent parts are provided in context, revealing that the source of weakness is indeed compositional reasoning. Our results expose a persistent gap between the generalization behavior of current AI provers and human mathematical intuition. All data and evaluation code can be found at https://github.com/haoyuzhao123/LeanIneqComp.
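To make the task concrete, here is a hedged illustrative example (not drawn from the Ineq-Comp benchmark) of the kind of elementary inequality involved: the two-variable AM-GM instance in squared form, stated in Lean 4 with Mathlib. It follows from the single observation that (a − b)² ≥ 0, which `nlinarith` can exploit when given that hint.

```lean
import Mathlib

-- Two-variable AM-GM in squared form: 2ab ≤ a² + b²,
-- which follows from (a − b)² ≥ 0.
theorem amgm_sq (a b : ℝ) : 2 * a * b ≤ a ^ 2 + b ^ 2 := by
  nlinarith [sq_nonneg (a - b)]
```

The benchmark's compositional variants would, in this spirit, require chaining several such lemmas (or algebraically rewritten versions of them) within one proof, which is where current provers reportedly struggle.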


How to solve AI's inequality problem

MIT Technology Review

His 2014 book, coauthored with Andrew McAfee, is called The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies. But he says the thinking of AI researchers has been too limited. "I talk to many researchers, and they say: 'Our job is to make a machine that is like a human.' It's a clear vision," he says. But, he adds, "it's also kind of a lazy, low bar."


Requests for Research 2.0

#artificialintelligence

If you're not sure where to begin, here are some solved starter problems. Train an LSTM to solve the XOR problem: that is, given a sequence of bits, determine its parity. The LSTM should consume the sequence, one bit at a time, and then output the correct answer at the sequence's end.
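The parity task above can be sketched in a few lines. The following is a minimal sketch assuming PyTorch; the class and helper names (`ParityLSTM`, `make_batch`) and hyperparameters are illustrative choices, not part of the original request.

```python
# Hedged sketch: an LSTM that consumes a bit sequence one step at a
# time and predicts its parity (the XOR of all bits) from the final
# hidden state.
import torch
import torch.nn as nn

class ParityLSTM(nn.Module):
    def __init__(self, hidden: int = 16):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden,
                            batch_first=True)
        self.head = nn.Linear(hidden, 1)  # logit for parity = 1

    def forward(self, bits: torch.Tensor) -> torch.Tensor:
        # bits: (batch, seq_len, 1); read off the final hidden state.
        _, (h, _) = self.lstm(bits)
        return self.head(h[-1]).squeeze(-1)

def make_batch(n: int, seq_len: int):
    """Random bit sequences with their parity labels."""
    bits = torch.randint(0, 2, (n, seq_len, 1)).float()
    parity = bits.sum(dim=(1, 2)) % 2  # XOR of all bits in the sequence
    return bits, parity

model = ParityLSTM()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.BCEWithLogitsLoss()
for step in range(300):
    x, y = make_batch(64, seq_len=8)
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
```

One design note: supervising only the final time step (rather than every step) is what makes the problem a memory test; the LSTM must carry the running parity through its cell state across the whole sequence.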